Snowpark integration: Add support for uncollected dataframes in st.map and improve st.table data collection #5590
Conversation
Branch updated: 3c17c77 → 5548d42
Branch updated (…p and improve st.table data collection): a546c77 → f226b24
lib/streamlit/elements/map.py
Outdated
```python
    return json.dumps(_DEFAULT_MAP)

if hasattr(data, "empty"):
```
I think this can be simplified a bit:

```python
if hasattr(data, "empty") and data.empty:
    return json.dumps(_DEFAULT_MAP)
```

And I think we can remove the ignore and the comment from harahu.
Ok, so the version with `and` requires `# type: ignore`, which makes @harahu's comment completely up to date and valid, I think 🤔
Also, you meant `and`, right? Not the `&` operator?
I've updated the PR with a valid example.
Ahh yes, meant `and`. Ok, in that case we can leave the comment. I thought that `hasattr(data, "empty")` would give enough hints to the typing tool for this 🤔
Overall looks good 👍 I added a few comments
```diff
@@ -186,7 +194,7 @@ def to_deckgl_json(data: Data, zoom: Optional[int]) -> str:
     # the empty attribute. This is either a bug, or the documented data type
     # is too broad. One or the other should be addressed, and the ignore
     # statement removed.
-    if data[lon].isnull().values.any() or data[lat].isnull().values.any():  # type: ignore[index]
+    if data[lon].isnull().values.any() or data[lat].isnull().values.any():
```
nit: If we can remove the ignore statement here, we can also remove the comment from harahu.
It's removed 👍
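For reference, a small runnable check of the null-detection logic from the diff above (plain pandas, independent of Streamlit; the sample coordinates are made up):

```python
import pandas as pd

# Toy data with one missing latitude, exercising the check from the diff.
df = pd.DataFrame({"lat": [37.76, None], "lon": [-122.40, -122.41]})
lat, lon = "lat", "lon"

has_missing = bool(df[lon].isnull().values.any() or df[lat].isnull().values.any())
# has_missing is True: the row with the missing coordinate is detected
```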
lib/streamlit/type_util.py
Outdated
```python
if df.shape[0] == MAX_UNEVALUATED_DF_ROWS:
    if (
        is_type(df, _SNOWPARK_DF_TYPE_STR)
        and not isinstance(df, list)
```
nit: I think the `and not isinstance(df, list)` is not really necessary here, since if it is of type `_SNOWPARK_DF_TYPE_STR` or `_SNOWPARK_TABLE_TYPE_STR` it cannot be a list anyway. An alternative might be something like:

```python
is_snowpark_dataframe(df) and not isinstance(df, list)
```
It's updated now 👍
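The truncation check under discussion can be modeled in isolation. A toy sketch (not the real Streamlit code; the constant's value of 10000 is assumed from the "10k rows" caption elsewhere in the thread):

```python
MAX_UNEVALUATED_DF_ROWS = 10000  # assumed value, matching the "10k rows" caption

def evaluate_lazy(rows, max_rows=MAX_UNEVALUATED_DF_ROWS):
    """Toy model of collecting a lazy dataframe with a row cap: take at most
    max_rows rows, and report whether the result may have been truncated
    (hitting the cap exactly suggests more rows exist upstream)."""
    collected = rows[:max_rows]
    maybe_truncated = len(collected) == max_rows
    return collected, maybe_truncated

data, truncated = evaluate_lazy(list(range(15000)))
# len(data) == 10000 and truncated is True
```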
lib/streamlit/type_util.py
Outdated
```diff
 st.caption(
-    f"⚠️ Showing only 10k rows. Call `collect()` on the dataframe to show more."
+    f"⚠️ Showing only {'10k' if max_unevaluated_rows == MAX_UNEVALUATED_DF_ROWS else str(max_unevaluated_rows)} rows. "
```
nit: at this point, it might be worth adding a function to `string_util` that automatically does this, e.g.:

```python
def simplify_number(num: int) -> str:
    num_converted = float("{:.2g}".format(num))
    magnitude = 0
    while abs(num_converted) >= 1000:
        magnitude += 1
        num_converted /= 1000.0
    return "{}{}".format(
        "{:f}".format(num_converted).rstrip("0").rstrip("."),
        ["", "k", "m", "b", "t"][magnitude],
    )
```
Your func is created in `string_util` and unit tests are added 👍 Thank you 🙇
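For the record, the suggested helper abbreviates row counts like this (function body copied from the suggestion above; the example calls are my own):

```python
def simplify_number(num: int) -> str:
    """Round to two significant digits and abbreviate with k/m/b/t suffixes."""
    num_converted = float("{:.2g}".format(num))
    magnitude = 0
    while abs(num_converted) >= 1000:
        magnitude += 1
        num_converted /= 1000.0
    return "{}{}".format(
        "{:f}".format(num_converted).rstrip("0").rstrip("."),
        ["", "k", "m", "b", "t"][magnitude],
    )

print(simplify_number(100))      # -> "100"
print(simplify_number(10000))    # -> "10k"
print(simplify_number(1500000))  # -> "1.5m"
```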
lib/streamlit/elements/map.py
Outdated
```python
if data.empty:  # type: ignore[union-attr]
    return json.dumps(_DEFAULT_MAP)

if type_util.is_snowpark_dataframe(data):
```
I think we actually can just always call `convert_anything_to_df` here without the `is_snowpark_dataframe` check. I don't think there is any reason why `map` should not have the same support for dataframe-like input types as our other elements. In that case, the `data = pd.DataFrame(data)` in line 200 would need to be removed and the docstring would need to be updated to cover the same types as `st.dataframe`.
Now we always call `convert_anything_to_df` 👍
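To illustrate the dispatch idea behind the suggestion above, here is a simplified sketch (`convert_anything_to_df_sketch` is a hypothetical name, not the real Streamlit helper):

```python
import pandas as pd

def convert_anything_to_df_sketch(data):
    """Simplified sketch of the convert-anything idea (NOT the real Streamlit
    convert_anything_to_df): pass DataFrames through unchanged, use
    to_pandas() when the object offers one (e.g. pyarrow.Table), and fall
    back to the plain DataFrame constructor otherwise."""
    if isinstance(data, pd.DataFrame):
        return data
    if hasattr(data, "to_pandas"):
        return data.to_pandas()
    return pd.DataFrame(data)

df = convert_anything_to_df_sketch({"lat": [37.76], "lon": [-122.40]})
```

With such a single entry point, `st.map` would accept the same dataframe-like inputs as the other elements without a type-specific check.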
lib/streamlit/type_util.py
Outdated
```diff
@@ -241,9 +242,11 @@ def is_dataframe_like(obj: object) -> TypeGuard[DataFrameLike]:


def is_snowpark_dataframe(obj: object) -> bool:
```
nit: maybe it is better to rename this to be a bit more generic, e.g. something like `is_snowpark_data_object`, since it is no longer just about the actual Snowpark dataframe.
It's renamed 👍
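A sketch of what the renamed, more generic check could look like. The fully qualified type names are assumptions based on the snowflake-snowpark package layout, and the string-comparison approach mirrors the `is_type` pattern seen in the diffs above:

```python
# Assumed fully qualified names; the exact constants in type_util may differ.
_SNOWPARK_DF_TYPE_STR = "snowflake.snowpark.dataframe.DataFrame"
_SNOWPARK_TABLE_TYPE_STR = "snowflake.snowpark.table.Table"

def _fully_qualified_type_name(obj: object) -> str:
    """Module-qualified class name, e.g. 'snowflake.snowpark.table.Table'."""
    t = type(obj)
    return f"{t.__module__}.{t.__qualname__}"

def is_snowpark_data_object(obj: object) -> bool:
    """True for both Snowpark DataFrame and Table objects, compared by name
    so the check is safe even when snowpark is not installed."""
    return _fully_qualified_type_name(obj) in (
        _SNOWPARK_DF_TYPE_STR,
        _SNOWPARK_TABLE_TYPE_STR,
    )
```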
@LukasMasuch thank you for your review 🙇 I've updated the code accordingly 👍
LGTM 👍
lib/streamlit/type_util.py
Outdated
```python
"""Try to convert different formats to a Pandas Dataframe.

Parameters
----------
df : ndarray, Iterable, dict, DataFrame, Styler, pa.Table, None, dict, list, or any

max_unevaluated_rows: int, If unevaluated data is detected this func will evaluate it,
```
nit: I think you need to put the description here on a new line.
lib/streamlit/type_util.py
Outdated
```diff
 st.caption(
-    f"⚠️ Showing only 10k rows. Call `collect()` on the dataframe to show more."
+    f"⚠️ Showing only {string_util.simplify_number(max_unevaluated_rows)} rows. "
+    f"Call `collect()` on the dataframe to show more."
```
nit: I think you can remove the `f` here in the second line (since there are no variables to replace).
📚 Context
This PR improves Snowpark integration in Streamlit. It introduces changes like:

- Support for uncollected Snowpark dataframes in the `st.map` component
- Support for Snowpark tables (`snowflake.snowpark.table.Table`) in Streamlit components; before, only unevaluated Dataframes were collected
- Limit data collection in `st.table` to 100 rows instead of 10k rows. When 10k rows were collected in `st.table` it was buggy: the page took long to load, the app sometimes crashed, and the page was very long (it was discussed with @jrieke)

What kind of change does this PR introduce?
Contribution License Agreement
By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.